KANDA DATA

  • Home
  • About Us
  • Contact
  • Sitemap
  • Privacy Policy
  • Disclaimer
  • Bimbingan Online Kanda Data
Menu
  • Home
  • About Us
  • Contact
  • Sitemap
  • Privacy Policy
  • Disclaimer
  • Bimbingan Online Kanda Data
Home/Data Analysis in R/How to Test for Normality in Linear Regression Analysis Using R Studio

Blog

2,515 views

How to Test for Normality in Linear Regression Analysis Using R Studio

By Kanda Data / Date Nov 21.2023
Data Analysis in R

Testing for normality in linear regression analysis is a crucial part of inferential method assumptions, requiring regression residuals to be normally distributed. Residuals are the differences between observed values and those predicted by the linear regression model.

It’s vital to perform a normality test to ensure that the residuals in the regression equation are normally distributed. The most commonly used tests for regression normality among researchers are the Shapiro-Wilk test and the Kolmogorov-Smirnov test.

The assumption of normality in residuals is important as it can affect the validity of the model’s estimation results, as well as the consistency and accuracy of the model’s regression predictions. If the distribution of residuals is not normal, the results of the regression analysis may be biased.

If residuals are not normally distributed, researchers can consider several alternatives, such as increasing the data size, modifying the equation specifications, data transformation, and other methods. In R Studio, normality testing can be conducted using various statistical methods and functions.

One common method for testing normality in R Studio is the Shapiro-Wilk test. Besides this, R Studio also offers various other normality tests, such as the Kolmogorov-Smirnov test, Lilliefors test, and Anderson-Darling test.

These tests yield a p-value, which helps decide whether the residual distribution can be considered normal. A p-value greater than a predetermined significance level (e.g., alpha 0.05) indicates insufficient evidence to reject the null hypothesis (accepting the alternative hypothesis), suggesting that the residuals are normally distributed.

Importing Data from Excel to R Studio

In R Studio, data import from Excel can be done in several ways. The most common method involves using the ‘readxl’ and ‘openxlsx’ commands. To import data from an Excel file in R Studio, you would write:

> library(readxl)

> Regression <- read_excel(“C:/KANDA DATA/Regression.xlsx”)

> View(Regression)

Adjust the file path and Excel file name as per your requirements. Following this tutorial, the imported data will appear in R Studio as shown:

Normality Test in Regression Analysis Using Residuals

Residuals are the differences between observed values and those predicted by the model. In simple linear regression, residuals are calculated by subtracting the observed value from the predicted value. Mathematically, a residual (e) is represented as e = y – ŷ, where y is the actual observed value, and ŷ is the value predicted by the regression model. Residuals are used to evaluate the quality of a model, with smaller residuals indicating a model accurately predicts data.

In regression normality testing, the focus is on the values of the residuals. Therefore, it’s important for researchers to calculate the residuals in linear regression analysis. To find the residual values, first write the following command:

model <- lm(Y ~ X, data = Regression)

In the above command, replace Y and X with the labels of the variables in your R Studio. ‘Regression’ is the name of the Excel file I used. Next, to display the residual values, write:

residuals_model <- residuals(model)

Once you’ve entered the above command, the residuals are ready for normality testing. You can choose any one of the normality tests for linear regression analysis.

Testing for Normality in R Studio

For normality testing using R Studio, here I provide a tutorial using the Shapiro-Wilk and Kolmogorov-Smirnov Tests. For the Shapiro-Wilk test in regression analysis, please write the following command:

shapiro.test(residuals_model)

Next, for the Kolmogorov-Smirnov test, you can write the following command in R Studio:

ks.test(residuals_model, “pnorm”)

Based on the detailed results of the Shapiro-Wilk and Kolmogorov-Smirnov tests using R Studio, the findings are as follows:

The p-value from the normality test can help determine the extent to which the residual distribution approximates a normal distribution. A p-value > 0.05 indicates that there is not enough evidence to reject the null hypothesis, meaning the residual distribution can be considered normal.

Well, this is the article that I can write at this opportunity. Hopefully, it is beneficial and adds new knowledge value for those in need. Stay tuned for updates from Kanda Data next week. Thank you.

Tags: Data Import in R, Excel to R Studio, inferential statistics, Kanda data, Kolmogorov-Smirnov Test, linear regression analysis, normality test, P-Value Interpretation, R Studio, Residual Analysis, Shapiro-Wilk Test, Statistical Methods

Related posts

How to Determine the Minimum Sample Size in Survey Research to Ensure Representativeness

Date Oct 02.2025

Regression Analysis for Binary Categorical Dependent Variables

Date Sep 27.2025

How to Sort Values from Highest to Lowest in Excel

Date Sep 01.2025

Leave a Reply Cancel reply

You must be logged in to post a comment.

Categories

  • Article Publication
  • Assumptions of Linear Regression
  • Comparison Test
  • Correlation Test
  • Data Analysis in R
  • Econometrics
  • Excel Tutorial for Statistics
  • Multiple Linear Regression
  • Nonparametric Statistics
  • Profit Analysis
  • Regression Tutorial using Excel
  • Research Methodology
  • Simple Linear Regression
  • Statistics

Popular Post

October 2025
M T W T F S S
 12345
6789101112
13141516171819
20212223242526
2728293031  
« Sep    
  • How to Determine the Minimum Sample Size in Survey Research to Ensure Representativeness
  • Regression Analysis for Binary Categorical Dependent Variables
  • How to Sort Values from Highest to Lowest in Excel
  • How to Perform Descriptive Statistics in Excel in Under 1 Minute
  • How to Tabulate Data Using Pivot Table for Your Research Results
Copyright KANDA DATA 2025. All Rights Reserved